COMPARISON OF TWO TREATMENTS WHEN THERE MAY BE AN INITIAL EFFECT


ELIZABETH  L.  SCOTT,  University  of  California,  Berkeley 


Abstract 

Consider  situations  where  the  treatment  may  cause  an  initial  effect 
and  may  also  cause  a  long-range  effect.  We  want  to  evaluate  the  treatment, 
or  to  compare  two  treatments,  when  the  effect  of  treatment  may  result  from 
the two distinct mechanisms, M1 and M2. We may wish to evaluate M1
and M2 separately, but we may also want to evaluate their combined effect
M3. Examples are given and the general results are applied to the special
case  arising  in  weather  modification  studies  and  elsewhere:  the  possible 
effects  are  multiplicative  and  the  distribution  of  nonzero  variables  is 
Gamma  with  at  most  the  scale  parameter  affected  by  treatment.  An  example 
demonstrates  that  the  two  components  may  be  too  weak  to  be  judged  significant 
while their sum is large and significant. The locally optimum C(α) test is used.

There  is  a  brief  discussion  of  the  power  function  of  the  tests.  The 
asymptotic  power  agrees  well,  in  general,  with  the  results  of  the  Monte 
Carlo  simulation  for  the  test  of  the  combined  effect.  If  the  zero 
values  are  discarded  and  then  Z2  employed,  there  is  large  bias  in  the 
power.  The  bias  is  more  pronounced  if  the  Wilcoxon,  Mann-Whitney  test  is 
employed.  Notice  that  the  two  effects  under  study  may  be  acting  in  the 
same  direction  or  they  may  be  in  opposition. 

TREATMENTS WITH TWO MECHANISMS, NEYMAN C(α) TESTS, POWER FUNCTION,
GAMMA DISTRIBUTION, MULTIPLICATIVE EFFECT
1. Introduction

We  consider  situations  where  the  treatment  may  cause  an  initial  effect 
and  may  also  cause  a  long-range  effect.  There  are  many  examples.  Mosteller 
(1977)  described  the  Portacaval  Shunt  Operation  "designed  to  reduce  pressure 
from  the  blood  stream  in  the  esophagus  and  thus  prevent  or  stop  hemorrhaging 
in  the  patient.  The  operation  has  a  substantial  death  rate".  Other  treatments 
and  also  nontreatment  have  a  substantial  death  rate.  The  Portacaval  Shunt 
Operation  may  (1)  affect  the  probability  of  surviving  the  initial  period  of 
treatment  and  also  may  (2)  affect  the  number  of  years  of  survival  of  those 
patients  who  do  live  through  the  operation.  The  two  effects  may  be  in  the 
same  direction  or  in  opposite  directions.  Although  both  effects  are  of 
concern  to  the  patient  and  his  physician,  the  combined  effect  is  also  important. 

We  want  to  evaluate  the  treatment,  or  to  compare  two  treatments,  when 
the  effect  of  treatment  may  result  from  two  distinct  mechanisms,  denoted 
by M1 and M2, say. Mechanism M1 consists in the possible modification
of the probability of an initial effect of treatment. The hypothesis
that no such effect occurs will be denoted by H1. Then, mechanism M2
consists in a change in the conditional distribution of the variable under
study, say Y, given that Y > 0. The hypothesis that M2 is not operating
will be denoted by H2. We may wish to evaluate M1 and M2 separately,
but we may also want to estimate their combined effect M3, the total
change per experimental unit. The distinction between the mechanisms
M1 and M2 and their combined effect M3 is often ignored. This may
be unfortunate since the separate effects of the initial mechanism and of
the long-range mechanism may be weak and therefore difficult to detect
while their combined effect may be important and capable of detection.

On the other hand, M1 and M2 may be in opposition so that they tend
to  cancel  each  other.  In  this  case,  an  analysis  of  either  alone  may  be 
misleading. 

A  second  example  where  the  treatment  may  act  through  two  mechanisms 
arises in the treatment of cancer (and other diseases). M1 could alter
the  probability  of  unpleasant  side  effects  that  could  force  the  patient 
to  withdraw  from  treatment  and/or  be  fatal.  M2  may  affect  the  conditional 
expected  length  of  survival,  given  that  the  patient  continues  treatment. 

As  an  illustration,  for  some  diagnoses  of  cancer,  the  standard  treatment 
is  a  harsh  chemical  program  which  some  patients  cannot  withstand.  A  new 
treatment  consists  of  the  administration  of  a  transfer  factor  designed 
to  increase  the  patient's  immunity  to  his/her  specific  kind  of  cancer. 

The  statistician  consulted  on  such  an  experiment  may  want  to  compare  the 
treatments  by  comparing  the  performance  of  all  patients  assigned  to  one 
treatment  with  the  performance  of  all  patients  assigned  to  the  other. 

However,  the  physician  may  feel  that  those  patients  who  withdrew  from 
treatment early in the experiment, for whatever reason, have been
administered  so  little  treatment  that  their  inclusion  would  not  be 
meaningful  and  would  tend  to  dilute  the  results.  Actually,  the  statistician 
wants  to  study  mechanism  M3  and  the  physician  wants  to  study  M2. 

In  many  examples,  the  distribution  of  survival  time  is  nonstandard. 

A  further  complication  arises  when  the  experimental  units  are  not  homogeneous. 
In the example above, the patients may differ with respect to age, sex,
level of diagnosis, and so forth. These characteristics of the unit
may  serve  as  predictor  variables  for  the  experimental  variable  under 
study. 

Speaking  generally,  we  consider  a  randomized  experiment  with  independent 
units.  For  the  k-th  unit,  let 

$X_k$ = the predictor variable with probability density $f(x_k; \lambda)$, where
$X$ and $\lambda$ may be vectors;

$T_k$ = 1 if the new treatment is applied, 0 if not, with $\Pr\{T = 1\} = \pi$;

$Y_k$ = the experimental variable with probability density $p[y \mid x, \theta(t, \xi)]$
of known form, with vector parameters.

We assume that the effect of treatment enters through $\xi$, as follows. If
$\theta_j(t, \xi) = \theta_j$ when $T = 0$, then when $T = 1$ with the same value $x$, we have

J  V 

(1-1)  0j  (t,c)  *  J  =  •••>  s. 

We  thus  have  a  triplet  (X,T,Y)  for  each  experimental  unit,  with 
probability  density,  say, 

(1.2)  $\psi(x, t, y) = \pi^{t} (1 - \pi)^{1 - t} f(x; \lambda)\, p[y \mid x, \theta(t, \xi)]$.

The hypothesis of no effect becomes the hypothesis $\xi = 0$. Neyman
and Scott (1967) have found the locally optimum test of class C(α) to
have as test criterion


(1.3)  Z 
where


The criterion Z is asymptotically Normal(0,1) when $\xi = 0$, and is
noncentral Normal when $\xi \neq 0$, with noncentrality parameter equal to
$\xi$ multiplied by the denominator of Z.
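
As a minimal sketch of how this asymptotic statement translates into a power calculation: if Z is Normal with mean $\tau$ (the noncentrality parameter) and unit variance, a two-tailed test at level $\alpha$ rejects with probability $\Phi(-\nu - \tau) + 1 - \Phi(\nu - \tau)$, where $\nu$ is the two-tailed critical value. The function name `asymptotic_power` and the argument `tau` are illustrative choices, not notation from the original.

```python
from statistics import NormalDist


def asymptotic_power(tau: float, alpha: float = 0.05) -> float:
    """Power of the two-tailed test |Z| >= nu(alpha) when Z ~ Normal(tau, 1)."""
    std = NormalDist()
    nu = std.inv_cdf(1.0 - alpha / 2.0)  # two-tailed critical value
    # Probability of falling in the lower or the upper rejection region.
    return std.cdf(-nu - tau) + (1.0 - std.cdf(nu - tau))
```

With `tau = 0` the function returns the level of significance itself, and the power is symmetric in the sign of `tau`.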

Neyman  and  Scott  (1965,  1967)  noted  that  the  test  criterion  (1.3) 
does  not  depend  on  the  distribution  f  of  the  predictor  variable  X 
except  that  the  denominator  must  take  into  account  the  variability  of 
the  predictor  X  as  well  as  that  of  the  experimental  variable  Y. 

Moran  (1973)  extended  this  result. 

In  the  situation  of  this  paper,  we  may  wish  to  consider  three  problems 
separately:

1) We may want to estimate $\xi_1$, the effect of M1, or we may want to
test that M1 has no effect, which would correspond to $\xi_1 = 0$.

2) We may want to estimate $\xi_2$, the effect of M2, or we may want to
test that M2 has no effect, which would correspond to $\xi_2 = 0$.

3) We may want to estimate the combined effect $\xi_3$, or we may want to
test that M3 has no effect, which would correspond to $\xi_3 = 0$.

We thus employ (1.3) to develop three test criteria Z1, Z2, and Z3.

A  case  of  wide  application  is  considered  in  the  next  section. 
2. Case of multiplicative effect accompanied by a Gamma distribution


In  many  applications,  we  can  assume  that  if  the  effect  occurs  at 
all,  it  is  multiplicative.  Often,  we  can  assume  that  the  distribution 
of the nonzero variable is Gamma with shape parameter unaffected by
the treatment. We then have for the initial mechanism

$\theta(\xi_1) = \theta (1 + \xi_1)$

so that $\xi_1$ measures the proportional improvement,

(2.1)  $\xi_1 = [\theta(\xi_1) - \theta] / \theta$.

For mechanism M2, on combining the assumption that the nonzero
effect is multiplicative with the assumption of a Gamma distribution with
at most the scale parameter affected, we have

(2.2)  $p_Y(y \mid \gamma, \delta) = \frac{\delta^{\gamma}}{\Gamma(\gamma)}\, y^{\gamma - 1} e^{-\delta y}, \quad y > 0$,

where $\gamma > 0$ is the shape parameter and $\delta > 0$ is the inverse of the
scale parameter. Under the assumption that treatment can affect only
the scale parameter,

(2.3)  $\delta_t = \delta(\xi_2) = \delta_u / (1 + \xi_2)$,

where t = treated (new treatment) and u = untreated (standard).
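
The two-mechanism model can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not the authors' simulation code: the probability of a nonzero value is $\theta$ for untreated units and $\theta(1 + \xi_1)$ for treated units, and nonzero values are Gamma with shape $\gamma$ and inverse scale $\delta$ (untreated) or $\delta / (1 + \xi_2)$ (treated). The function names and argument names are hypothetical.

```python
import random


def simulate_unit(theta, gamma, delta, xi1, xi2, treated, rng):
    """Draw one Y: zero when there is no initial effect, otherwise Gamma."""
    if 1.0 + xi2 <= 0.0:
        raise ValueError("xi2 must exceed -1")
    p_nonzero = theta * (1.0 + xi1) if treated else theta
    if not 0.0 <= p_nonzero <= 1.0:
        raise ValueError("theta * (1 + xi1) must lie in [0, 1]")
    if rng.random() >= p_nonzero:
        return 0.0  # no initial effect, e.g. no precipitation
    rate = delta / (1.0 + xi2) if treated else delta  # inverse scale
    return rng.gammavariate(gamma, 1.0 / rate)  # arguments: shape, scale


def simulate_trial(n, pi, theta, gamma, delta, xi1=0.0, xi2=0.0, seed=0):
    """Randomized trial: each unit is treated independently with probability pi."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        treated = rng.random() < pi
        y = simulate_unit(theta, gamma, delta, xi1, xi2, treated, rng)
        units.append((treated, y))
    return units
```

For example, `simulate_trial(200, 0.5, 0.8, 0.6, 1.0, xi1=0.1, xi2=0.2)` returns 200 pairs `(treated, y)`; the expected value of `y` is $\theta\gamma/\delta$ for untreated units and $\theta(1+\xi_1)\gamma(1+\xi_2)/\delta$ for treated units.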


The  analysis  of  weather  modification  experiments  is  an  example 
where  the  assumptions  of  multiplicative  effect  and  of  a  Gamma  distribution 
of  the  nonzero  effects  are  well  satisfied.  In  fact,  this  application  led  to 
the development of the problem (Neyman and Scott, 1967). Earlier analyses
used designs with comparison areas under Normal theory (cf. Moran, 1955)
and then under locally optimal C(α) theory (Neyman, Scott, and Vasilevskis,
1960).  Moran  showed  that  the  comparison  area  design  with  cross-over  of 
treatment  is  advantageous  when  applicable.  However,  the  effects  of  weather 
modification  appear  to  be  widespread  causing  contamination  of  the 
comparison  area.  Similar  difficulties  can  arise  in  other  types  of  application; 
we  will  restrict  attention  to  randomized  trials  on  homogeneous  units. 

Cloud  seeding,  for  example  by  the  release  of  silver  iodide  into  clouds 
in  an  effort  to  provide  nuclei  for  the  condensation  of  water  vapor,  could 
possibly  cause  precipitation  to  reach  the  ground  which  would  not  have 
fallen  otherwise  (or  conversely).  In  addition,  the  cloud  seeding  may 
increase  (or  decrease)  the  amount  of  precipitation  falling,  given  that  there 
is some precipitation. Thus, we have a mechanism M1 and a mechanism M2
that  may  be  acting  in  the  same  or  in  opposite  directions.  The  total  effect 
depends  on  the  combination  of  the  two  mechanisms. 

Meteorologists  predict  that  both  of  the  postulated  mechanisms  will  be 
multiplicative.  The  distribution  of  nonzero  precipitation  is  typically  a 
Gamma  distribution,  and  as  illustrated  in  Figure  1,  this  approximation  is 
reasonable  even  when  the  same  shape  parameter  is  employed  for  both  the  seeded 
and the not-seeded experimental units, at least for similar types of storms.

When  the  storm  categories  may  differ,  it  is  reasonable  (Dawkins,  Neyman,  Scott, 
and  Wells,  1977)  to  assume  that  the  experimenters  can  predict  the  category 
before  treatment  starts,  and  before  the  randomized  decision  to  treat  or  not 
treat  the  storm  is  made.  For  example,  the  experimenters  can  predict  the 
duration $D = d_k$ for the k-th experimental unit, and can assume that its effect
enters the distribution of untreated precipitation only through the
shape parameter, which can be approximated linearly,

$\gamma(d) = A_0 + A_1 (d - d_0)$.

Here, $d_0$ is the population mean of the variable predicted duration,
and $A_0$, $A_1$, and $\delta_u$ are unknown 'nuisance parameters'. Notice
that  the  differences  in  storm  types,  which  may  be  complex,  have  a 
summary  effect  on  the  distribution  of  precipitation  which  may  be 
quite  large  but  can  be  summarized  by  changes  in  the  shape  of  the 
distribution  of  nonzero  precipitation  expressed  as  a  linear  function 
of  the  predicted  duration.  Since  the  effect  of  seeding,  if  any,  is 
assumed  to  alter  only  the  scale  parameter,  we  have  that  the  conditional 
distribution  of  nonzero  precipitation,  given  the  predicted  duration, 
is  a  Gamma  distribution  with  constant  shape  parameter. 

Our  experience  indicates  that  similar  assumptions  can  be  made  in 
other  fields  of  application,  for  example  in  survival  analysis  for 
clinical  trials. 

Under  the  assumptions  and  notation  adopted,  the  expected  value 
of  the  precipitation  in  a  treated  unit  is 

(2.4)  $E(Y_t) = \theta(\xi_1) A_0 / \delta(\xi_2) = A_0 \theta (1 + \xi_1)(1 + \xi_2) / \delta_u$.

For  a  fixed  value  of  0,  that  is,  for  a  fixed  category  of  storm  types, 
the  expected  percent  effect  of  seeding,  due  to  both  mechanisms,  is 

(2.5)  Percent effect $= 100 [(1 + \xi_1)(1 + \xi_2) - 1] = 100 (\xi_1 + \xi_2 + \xi_1 \xi_2)$.
The theory of C(α) tests refers to certain limiting situations
where the number of observations is "large" and the effects of
treatment, such as $\xi_1$ and $\xi_2$, are "small". In consequence, we
have tended to adjust the test of H3 to be particularly sensitive
to

$\xi_1 + \xi_2 = \eta$, say,

neglecting the product term $\xi_1 \xi_2$. In typical weather modification
experimentation (also in clinical trials), $\xi_1$ might be 0.1 and
$\xi_2$ might be 0.2 so that the neglected product is only 0.02.
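
The arithmetic behind (2.5) and the neglected product term can be checked with a short sketch. The function name `percent_effect` is an illustrative choice; the second set of values in the usage note is hypothetical and chosen only to show two mechanisms in opposition.

```python
def percent_effect(xi1: float, xi2: float) -> tuple[float, float]:
    """Return (exact, approximate) percent effect of the two mechanisms.

    exact:       100 * [(1 + xi1)(1 + xi2) - 1], as in (2.5)
    approximate: 100 * (xi1 + xi2), neglecting the product term xi1 * xi2
    """
    exact = 100.0 * ((1.0 + xi1) * (1.0 + xi2) - 1.0)
    approximate = 100.0 * (xi1 + xi2)
    return exact, approximate
```

With the values quoted in the text, `percent_effect(0.1, 0.2)` gives about 32 percent exactly against 30 percent from the sum alone. With opposing effects such as `percent_effect(-0.2, 0.25)`, the exact combined effect is essentially zero although neither component is zero.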


3. Application

The test criteria for the individual tests are found to be as follows
(Neyman and Scott, 1967a), using C(α) tests.

For the hypothesis H1 that $\xi_1 = 0$, which means that the
probability of initial effect (the probability of initiating
precipitation) is not altered by treatment, the criterion corresponds to the
familiar chi for this case:
Number of Initial Reactions

                Treated     Untreated   Total

React           $n_{+t}$    $n_{+u}$    $n_{+\cdot}$

Do not react    $n_{0t}$    $n_{0u}$    $n_{0\cdot}$

Total                                   $n$

(3.1)  $Z_1 = (n_{+t} n_{0u} - n_{+u} n_{0t}) / [n \pi (1 - \pi) n_{+\cdot} n_{0\cdot}]^{1/2}$,

where the notation is set out in the usual table and $\pi$ is the
adopted probability of treatment. The significance probability of
Z1 is the two-tailed Normal probability: reject if $|Z_1| \geq \nu(\alpha)$,
the critical value corresponding to level $\alpha$.
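
A minimal sketch of the computation of Z1 from the 2 x 2 table follows. The function name and argument names are illustrative, and the counts in the usage note are hypothetical; they are not data from the experiment discussed in this paper.

```python
from math import sqrt
from statistics import NormalDist


def z1_statistic(n_pt, n_pu, n_0t, n_0u, pi=0.5):
    """Criterion (3.1) and its two-tailed Normal significance probability.

    n_pt, n_pu: treated / untreated units that react
    n_0t, n_0u: treated / untreated units that do not react
    pi:         adopted probability of treatment
    """
    n = n_pt + n_pu + n_0t + n_0u
    n_react = n_pt + n_pu
    n_no_react = n_0t + n_0u
    denominator = sqrt(n * pi * (1.0 - pi) * n_react * n_no_react)
    if denominator == 0.0:
        raise ValueError("a row of the table is empty, or pi is 0 or 1")
    z = (n_pt * n_0u - n_pu * n_0t) / denominator
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value
```

For instance, `z1_statistic(30, 20, 20, 30)` corresponds to 30 of 50 treated units and 20 of 50 untreated units reacting, and a perfectly balanced table such as `z1_statistic(10, 10, 10, 10)` gives a criterion of zero.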

For the hypothesis H2 that $\xi_2 = 0$, which means that, given
that the initial effect is survival (given that there is nonzero
precipitation), there is no effect of treatment (the scale parameter
in the Gamma distribution of nonzero precipitation is not altered),
the test criterion turns out to be (Dawkins, Neyman, Scott, and Wells, 1977)

(3.2)  $Z_2 = \{ \sum_t [\hat\delta y_k - \hat\gamma(d_k)] - \sum_u [\hat\delta y_k - \hat\gamma(d_k)] \} / (n \hat A_0)^{1/2}$,

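where the sums are taken over the t = treated and u = untreated
units separately. The $\hat A_0$, $\hat A_1$, and $\hat\delta$ are solutions of the
simultaneous maximum likelihood equations taken over all nonzero units,
treated and untreated,

$\sum_k \psi[\hat\gamma(d_k)] - n \log \hat A_0 = \sum_k \log y_k - n \log (\sum_k y_k / n)$,

$\sum_k (d_k - \bar d)\, \psi[\hat\gamma(d_k)] = \sum_k (d_k - \bar d) \log_e y_k$,

$\hat\delta = n \hat A_0 / \sum_k y_k$,

with $\bar d = \sum_k d_k / n$, the grand mean, and $\psi$ the derivative of the
logarithm of the Gamma function. Also,

$\hat\gamma(d_k) = \hat A_0 + \hat A_1 (d_k - \bar d)$.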

The  significance  probability  of  Z2  is  two-tailed  Normal  asymptotically. 
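
The following sketch illustrates Z2 in the simplest special case, and only under loudly stated assumptions: there is no predictor variable (so the shape parameter is a single constant), the form of (3.2) with equal treatment probabilities is used, and the inputs are the nonzero observations only. With no predictor, the maximum likelihood equations reduce to $\log\gamma - \psi(\gamma) = \log \bar y - \overline{\log y}$ and $\hat\delta = \hat\gamma / \bar y$. The helper names `digamma`, `fit_gamma`, and `z2_statistic` are hypothetical, and the digamma approximation is a standard recurrence plus asymptotic series, not a routine from the original work.

```python
from math import log, sqrt


def digamma(x: float) -> float:
    """Derivative of log Gamma for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ log x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + log(x) - 0.5 * inv - series


def fit_gamma(ys):
    """ML estimates (shape, inverse scale) from positive observations."""
    n = len(ys)
    mean_y = sum(ys) / n
    s = log(mean_y) - sum(log(y) for y in ys) / n
    if s <= 0.0:
        raise ValueError("observations must be positive and not all equal")
    lo, hi = 1e-8, 1e8
    for _ in range(200):  # bisection on a log scale
        mid = sqrt(lo * hi)
        if log(mid) - digamma(mid) > s:
            lo = mid  # log(g) - psi(g) decreases in g: shape is too small
        else:
            hi = mid
    shape = sqrt(lo * hi)
    return shape, shape / mean_y


def z2_statistic(treated_ys, untreated_ys):
    """Z2 from the nonzero observations of each arm (no-predictor case)."""
    ys = list(treated_ys) + list(untreated_ys)
    shape, rate = fit_gamma(ys)
    score_t = sum(rate * y - shape for y in treated_ys)
    score_u = sum(rate * y - shape for y in untreated_ys)
    return (score_t - score_u) / sqrt(len(ys) * shape)
```

A positive value indicates larger nonzero observations in the treated arm; identical arms give a value of zero.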

For the hypothesis H3 that the combined effect of treatment is
zero, we have as noted above been testing that $\eta = 0$. The
test criterion is a weighted sum of Z1 and Z2,

(3.3)  $Z_3 = (A_1 Z_1 + A_2 Z_2) / (A_1^2 + A_2^2)^{1/2}$,

with

$A_1^2 = \hat\theta / (1 - \hat\theta) = n_{+\cdot} / n_{0\cdot}$,

/>2  _  fl  * 

which  is  the  solution  of  the  system  of  maximum  likelihood  equations 
when both $\delta_t$ and $\delta_u$ are entered as separate and possibly different
parameters. The significance probability of Z3 is two-tailed Normal
asymptotically. 
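
The weighted combination in (3.3) can be sketched as follows. The squared weights are passed in by the caller, since this sketch makes no assumption about how they are estimated; the function name and the numerical values in the usage note are hypothetical and are not taken from Table 1.

```python
from math import sqrt
from statistics import NormalDist


def z3_statistic(z1, z2, a1_sq, a2_sq):
    """Combine Z1 and Z2 as (A1*Z1 + A2*Z2) / sqrt(A1^2 + A2^2).

    a1_sq, a2_sq: the squared weights A1^2 and A2^2 (both positive).
    Returns the criterion and its two-tailed Normal significance probability.
    """
    if a1_sq <= 0.0 or a2_sq <= 0.0:
        raise ValueError("squared weights must be positive")
    a1, a2 = sqrt(a1_sq), sqrt(a2_sq)
    z3 = (a1 * z1 + a2 * z2) / sqrt(a1_sq + a2_sq)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z3)))
    return z3, p_value
```

With equal weights, two components of 1.5 each (two-tailed probability about 0.13 individually) combine to roughly 2.12, which is below the 0.05 level, while components of 2.0 and -2.0 cancel to zero. This mirrors the point that the components may be individually weak yet jointly significant, or may mask each other when in opposition.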


The  application  of  the  three  test  criteria  is  illustrated  in 


Table  1,  referring  to  the  evaluation  of  hail  reports  from  the 
Grossversuch  III  hail  suppression  experiment  in  southern  Switzerland 
(Sanger et al., 1958-64). An earlier analysis (Neyman, Scott, and
Wells,  1966)  of  the  effects  of  seeding  on  rainfall  (which  is  easier 
to  observe  than  hail)  suggested  that  the  effect  of  seeding  is  positive, 
with  a  significant  increase  in  rainfall,  when  there  are  stability 
layers  in  the  atmosphere,  as  indicated  on  the  early  morning  nearest 
radiosonde  observed  at  Milan.  It  is  of  interest,  then,  to  study  the 
effects on hail for this category of days. The results are shown in
Table 1. The first two rows of the table refer to the category of days
'without stability layers', first for seeded (S) and then for nonseeded
(NS) days. The next two rows refer to days 'with stability layers',
and the last two rows to all days combined. The first block of results
refers to mechanism M1: is the frequency of days with hail altered
by seeding? There is an indication of an increase of +54% for the
category of days with stability layers, but the increase is not
significant by the usual standards; the two-tail significance
probability corresponding to the test criterion Z1 is only 0.093.
There  is  no  suggestion  of  change  on  days  without  stability  layers. 

When we examine the second mechanism M2, we note that the amount
of  hail  per  day  with  hail  appears  to  be  increased  by  +47%  but  the 
effect  is  not  significant,  P  now  being  0.17  for  the  experimental  days 
with  stability  layers  on  which  there  was  hail,  as  estimated  by  the 
asymptotic criterion Z2. When we continue to the totality of experimental
days  with  stability  layers  and  use  Z3  to  evaluate  the  possible  change 
in  the  number  of  hail  reports  per  experimental  day  on  which  there  are 
stability  layers,  we  find  an  estimated  increase  of  +127%  with  significance 
probability  0.024.  Thus  the  combined  effect  of  the  two  mechanisms  is  large 
and  significant;  they  appear  to  be  acting  in  the  same  direction. 

However,  on  experimental  days  without  stability  layers,  the 
estimated  effect  is  a  small  decrease,  but  it  is  far  from  significant.  On 
all experimental days whatsoever, the estimated effect is an increase of
+74%, with P = 0.049, significant at the standard level.

We  thus  have  evidence  that  both  mechanisms  are  playing  a  role: 
there  is  some  evidence  of  an  increase  in  the  probability  of  hail  and, 
given  that  there  is  hail  reported,  there  is  some  evidence  of  an  increase 
in  the  number  of  hail  reports.  As  occurred  when  rainfall  was  the 
experimental  variable,  we  find  the  positive  effect  is  pronounced  on  the 
experimental  days  with  stability  layers.  Since  the  purpose  of  the  cloud 
seeding was to reduce hail, it appears that seeding with silver iodide
is counter-indicated, at least as performed in this experiment, on days
with stability layers. If the experiment is analysed using only days
with positive hail reports (comparing the hail counts on hail days
that were seeded with those when there was no seeding, but discarding
the days with no hail reports, as is done by some operators), the
estimated effects (as shown in the middle part of Table 1) would be much
smaller, not significant, and possibly misleading.
4. Discussion of the power function

The probability of detecting an effect when it exists is given asymptotically
by a noncentral Normal distribution for each of the three test criteria. The
power function of Z3, the criterion for combined effect of the two mechanisms,
initial and long-range, is of particular interest. We examine briefly how this
power surface depends on the two individual effects, on $\xi_1$ and on
$\xi_2$ considered separately, and how it depends on their sum $\xi_1 + \xi_2$, which
is an approximation to the total effect $\xi_1 + \xi_2 + \xi_1 \xi_2$. When the two
mechanisms are acting in the same direction, what is the power surface
in a typical example, and how does this contrast with the power when the
two mechanisms are acting in opposite directions? Is the asymptotic
approximation for the power adequate with moderate sample sizes?

Neyman and Scott (1967c) investigated the power of the locally optimum
C(α) test criterion Z2 for detecting a change in the effect $\xi_2$ due
to mechanism M2 in a randomized experiment consisting of 100 independent
trials, under the assumptions that the nonzero variable is Gamma distributed
with no predictor variables and that the treatment effect is multiplicative,
changing at most the scale parameter. The power functions of three
nonparametric tests, the Wilcoxon, Mann-Whitney rank test, the
Kolmogorov-Smirnov test and the median test, were studied at the same time
since these tests are sometimes employed. The studies were made by Monte Carlo
simulation for typical cases arising in weather modification experimentation,
such as $\theta = 0.8$ for the untreated probability that precipitation will occur,
and $\gamma = 0.6$, $\delta = 1.0$ as the untreated parameters in the Gamma distribution.
With n = 100 experimental units, the power was discouragingly low for all
four tests. Even with level of significance 0.10, the probability of
detecting a multiplicative effect of 1.5, corresponding to an increase
of 50%, was slightly less than 0.6 for the locally optimum test,
lower still, about 0.45, for the Wilcoxon, Mann-Whitney test,
and even smaller for the Kolmogorov-Smirnov and the median tests, a
little more than 0.3 and a little less than 0.3, respectively. In these
studies, the ordering of the power functions of these four tests was
retained.

In  the  Monte  Carlo  studies  reported  here,  we  have  continued  comparisons 
with  the  Wilcoxon,  Mann-Whitney  test,  labelled  U  and  drawn  with  a  dashed 
line in the figures. Figure 2 gives a comparison of the Monte Carlo
power of the locally optimum test criterion Z3 for testing the effect
per experimental unit (solid line), as a function of the sum $\xi_1 + \xi_2$,
with the asymptotic theoretical power (dotted line). In each panel the value
of $\xi_1$ is fixed so that across a panel the value of $\xi_2$ is increasing,
negative at the left of the panel and positive at the right, with the
point of changeover through zero shifting as $\xi_1$ is increased. The case
considered is similar to that in the earlier paper except that 200
experimental units are considered in the randomized trials since we now
know that at least 200 trials are needed to achieve a reasonable experiment.
The asymptotic power function provides a reasonable approximation for
practical purposes except in those categories where $\xi_1$ is quite negative,
when the asymptotic power is too high, especially when $\xi_2$ is large and positive.

Figure 2 also shows the power function of the criterion Z2 for
comparison since, as noted above, some evaluations of the experiment have
been made using only the positive observations. Unless $\xi_1$ is near
zero (the center panel), the disagreement with the power functions of
Z3 is pronounced. In particular, the power function of Z2 has its
minimum when $\xi_2$ is zero, not when $\xi_1 + \xi_2$ is zero so that, unless
$\xi_1 = 0$, the use of Z2 to test for combined effect produces a test bias
which can be very large. When $\xi_1$ is negative, Z2 has little chance
of detecting that the combined effect is not zero when it is in fact
negative, but has probability of several times the level of significance
of finding that the effect is nonzero when it is actually zero. When the
combined effect is positive, the power continues high. When $\xi_1$ is
positive, the power function of the criterion Z2 is a reflection of
that just described. We thus conclude that when M1 and M2 are acting
in the same direction, so that $\xi_1$ and $\xi_2$ have the same sign, the
Z2 test criterion has very little chance of detecting that the combined
effect is not zero, even when the total of the two effects is quite large.
However, when the two mechanisms are acting in opposite directions, the
power of Z2 is greater than that of Z3. Unfortunately, this
phenomenon persists even when the combined effect is zero, making the
test invalid unless $\xi_1 = 0$ also.

The power function of the Wilcoxon, Mann-Whitney test is even more bizarre.
As indicated by the short-dashed lines, the test bias is large unless $\xi_1$
is near zero, in which case the power function is much lower than that of
competing tests. When $\xi_1$ and $\xi_2$ have opposite signs, the power of
the U test tends to be very low, approximately the level of significance.
When the mechanisms are in the same direction the power increases but
this is not helpful since in just these categories the U test is very
invalid, with a large probability of finding a nonzero effect when none exists.
We would like a method for estimating the effectiveness of the
combined mechanism, or for testing that the effect is zero, that is
more powerful than the test criterion Z3. Several former colleagues
in  the  Statistical  Laboratory  including  Barry  and  Kang  Ling  James,  S. 
Odoom,  and  Paul  Wang  are  investigating  these  problems.  Their  studies 
are  not  yet  completed  and  will  be  reported  elsewhere. 
Figure 1

Typical  comparison  of  observed  distribution  of  nonzero  precipitation 
with  Gamma  distribution  fitted  by  maximum  likelihood  with  same  shape 
parameter.  These  data  correspond  to  the  six  stations  with  altitude 
<  1000  km  in  zone  4  of  the  Swiss  hail  experiment  Grossversuch  III. 


Figure  2 

Power  function  for  several  tests  that  the  combined  effect  per  experimental 
unit is zero. Comparison of the asymptotic theoretical power for Z3 with
Monte Carlo simulated power for Z3, for Z2, and for Wilcoxon, Mann-Whitney
for fixed values of the initial effect $\xi_1$ and increasing values of the sum
$\xi_1 + \xi_2$ of the two effects (and thus increasing values of the long-range
effect $\xi_2$).


